Near-Optimal Bounds for Cross-Validation via Loss Stability

Authors

  • Ravi Kumar
  • Daniel Lokshtanov
  • Sergei Vassilvitskii
  • Andrea Vattani

Abstract

Multi-fold cross-validation is an established practice for estimating the error rate of a learning algorithm. Quantifying the variance reduction that cross-validation achieves has been challenging because of the inherent correlations introduced by the folds. In this work we introduce a new, weak stability measure called loss stability and relate the performance of cross-validation to this measure; we also establish that this relationship is near-optimal. Our work thus quantitatively improves the current best bounds on cross-validation.
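The multi-fold cross-validation estimate that the abstract analyzes can be sketched as follows. This is a minimal illustration of the standard k-fold procedure, not the paper's analysis: the names `k_fold_cv_error`, `train_mean` and `squared_loss`, the mean-predictor learner, and the round-robin fold assignment are hypothetical choices made only for this example.

```python
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[float, float]  # (x, y) pair; hypothetical data format for illustration


def k_fold_cv_error(
    data: Sequence[Example],
    k: int,
    train: Callable[[List[Example]], Callable[[float], float]],
    loss: Callable[[float, float], float],
    seed: int = 0,
) -> float:
    """Standard k-fold cross-validation estimate of the error rate."""
    if not 2 <= k <= len(data):
        raise ValueError("k must satisfy 2 <= k <= len(data)")
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    # Round-robin split: k folds whose sizes differ by at most one.
    folds = [idx[i::k] for i in range(k)]
    fold_errors = []
    for test_idx in folds:
        held_out = set(test_idx)
        # The hypothesis is learned on the other k - 1 folds.
        h = train([data[j] for j in idx if j not in held_out])
        errs = [loss(h(data[j][0]), data[j][1]) for j in test_idx]
        fold_errors.append(sum(errs) / len(errs))
    # The estimate is the average of the k per-fold errors.
    return sum(fold_errors) / k


def train_mean(train_set: List[Example]) -> Callable[[float], float]:
    """Hypothetical toy learner: always predicts the mean training label."""
    m = sum(y for _, y in train_set) / len(train_set)
    return lambda x: m


def squared_loss(pred: float, y: float) -> float:
    return (pred - y) ** 2


# Leave-one-out (k = n) on labels 0, 0, 3: (2.25 + 2.25 + 9) / 3
print(k_fold_cv_error([(0.0, 0.0), (1.0, 0.0), (2.0, 3.0)], 3, train_mean, squared_loss))
```

Every example appears in the training set of k − 1 of the k fits, so the per-fold errors are not independent; this is the correlation between folds that makes the variance of the averaged estimate hard to bound.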

Similar resources

Stability of cross-validation and minmax-optimal number of folds

In this paper, we analyze the properties of cross-validation from the perspective of the stability, that is, the difference between the training error and the error of the selected model applied to any other finite sample. In both the i.i.d. and non-i.i.d. cases, we derive the upper bounds of the one-round and average test error, referred to as the one-round/convoluted Rademacher-bounds, to qua...

Cross-Validation and Mean-Square Stability

k-fold cross validation is a popular practical method to get a good estimate of the error rate of a learning algorithm. Here, the set of examples is first partitioned into k equal-sized folds. Each fold acts as a test set for evaluating the hypothesis learned on the other k − 1 folds. The average error across the k hypotheses is used as an estimate of the error rate. Although widely used, espec...

Optimal Placement and Sizing of Multiple Renewable Distributed Generation Units Considering Load Variations Via Dragonfly Optimization Algorithm

The progression towards smart grids, integrating renewable energy resources, has increased the integration of distributed generators (DGs) into power distribution networks. However, several economic and technical challenges can result from the unsuitable incorporation of DGs in existing distribution networks. Therefore, optimal placement and sizing of DGs are of paramount importance to improve ...

Determining an Economically Optimal (N,C) Design via Using Loss Functions

In this paper, we introduce a new sampling plan based on the defective proportion of a batch. The proposed sampling plan is based on the distribution function of the proportion defective. A continuous loss function is used to quantify deviations between the proportion defective and its acceptance quality level (AQL). For practical purposes, a sensitivity analysis is carried out on the different v...

Near-Infrared Spectroscopic Analysis of Hemoglobin with Stability Based on Human Hemolysates Samples

Near-infrared (NIR) spectroscopy combined with the partial least-squares (PLS) regression was successfully applied for the rapid quantitative analysis of hemoglobin (HGB) based on human hemolysates samples. Based on the varied divisions for the calibration and prediction sets, an effective modeling approach using stable model parameters was proposed. Among 255 samples, 80 were randomly selected...

Publication date: 2013